State estimation is an important aspect of many robotics applications. In this work, we consider the task of obtaining accurate state estimates for robotic systems by augmenting the dynamics models used within state estimation algorithms. Existing frameworks, such as moving horizon estimation (MHE) and the unscented Kalman filter (UKF), provide the flexibility to incorporate nonlinear dynamics and measurement models. However, this implies that the dynamics models within these algorithms must be sufficiently accurate to warrant accurate state estimates. To enhance the dynamics models and improve estimation accuracy, we utilize a deep learning framework known as knowledge-based neural ordinary differential equations (KNODEs). The KNODE framework embeds prior knowledge into the training procedure and synthesizes accurate hybrid models by fusing a prior first-principles model with a neural ordinary differential equation (NODE) model. In our proposed framework, we integrate the data-driven model into two novel model-based state estimation algorithms, denoted as KNODE-MHE and KNODE-UKF. These two algorithms are compared against their conventional counterparts across a number of robotic applications: state estimation using partial measurements, localization of a ground robot, and state estimation of a quadrotor. Through simulations and tests using real-world experimental data, we demonstrate the versatility and efficacy of the proposed learning-enhanced state estimation framework.
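The hybrid model at the heart of this abstract can be sketched in a few lines: a first-principles prior plus a small neural correction, integrated numerically to give the one-step prediction a filter or estimator would use. This is a minimal illustration, not the paper's implementation; the damped-oscillator prior, network sizes, and weights below are all assumptions.

```python
import numpy as np

def f_prior(x):
    # Nominal first-principles dynamics (illustrative: a damped oscillator,
    # standing in for e.g. a ground-robot or quadrotor model).
    return np.array([x[1], -2.0 * x[0] - 0.1 * x[1]])

def nn_residual(x, W1, W2):
    # Small neural network capturing unmodeled dynamics; in the actual
    # KNODE procedure the weights W1, W2 would be trained on data.
    return W2 @ np.tanh(W1 @ x)

def f_hybrid(x, W1, W2):
    # KNODE-style hybrid model: prior knowledge plus a learned correction.
    return f_prior(x) + nn_residual(x, W1, W2)

def rk4_step(f, x, dt):
    # One fourth-order Runge-Kutta step: the one-step prediction that a
    # UKF propagation or MHE solver could call on the hybrid model.
    k1 = f(x)
    k2 = f(x + 0.5 * dt * k1)
    k3 = f(x + 0.5 * dt * k2)
    k4 = f(x + dt * k3)
    return x + dt / 6.0 * (k1 + 2 * k2 + 2 * k3 + k4)

rng = np.random.default_rng(0)
W1 = 0.01 * rng.standard_normal((8, 2))   # untrained weights, for shape only
W2 = 0.01 * rng.standard_normal((2, 8))
x = np.array([1.0, 0.0])
x_next = rk4_step(lambda s: f_hybrid(s, W1, W2), x, dt=0.01)
```

With near-zero residual weights the hybrid prediction stays close to the prior; training the residual on data is what closes the gap to the true dynamics.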
In this work, we consider the task of improving the accuracy of dynamics models for model predictive control (MPC) in an online setting. Even though prediction models can be learned and applied to model-based controllers, these models are often learned offline. In the offline setting, training data is first collected, and a prediction model is learned through an elaborate training procedure. Once the model is trained to the desired accuracy, it is then deployed in a model predictive controller. However, since the model is learned offline, it does not adapt to disturbances or model errors observed during deployment. To improve the adaptiveness of the model and the controller, we propose an online dynamics learning framework that continually improves the accuracy of the dynamics model during deployment. We adopt knowledge-based neural ordinary differential equations (KNODE) as the dynamics model, and use techniques inspired by transfer learning to continually improve the model's accuracy. We demonstrate the efficacy of our framework with a quadrotor, and verify the framework in both simulations and physical experiments. The results show that the proposed approach is able to account for disturbances that are possibly time-varying, while maintaining good trajectory tracking performance.
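The online-adaptation idea can be illustrated with a toy residual model: the offline prior knows part of the dynamics, and a small learned term is updated by gradient steps on windows of deployment data. A linear residual stands in for the KNODE network here; the matrices, learning rate, and window size are illustrative assumptions, not the paper's values.

```python
import numpy as np

# True dynamics: xdot = (A + B) x, but the offline prior only knows A.
A = np.array([[0.0, 1.0], [-1.0, 0.0]])
B = np.array([[0.0, 0.0], [-0.5, -0.2]])   # unmodeled effect / disturbance

def online_update(W, X, Xdot, lr=0.1):
    # One online gradient step on a linear residual model r(x) = W x,
    # fitting the mismatch between observed derivatives and the prior
    # (a stand-in for fine-tuning the KNODE network during deployment).
    target = Xdot - X @ A.T                  # what the prior fails to explain
    pred = X @ W.T
    grad = 2.0 * (pred - target).T @ X / len(X)   # d(mean squared error)/dW
    return W - lr * grad

rng = np.random.default_rng(1)
W = np.zeros((2, 2))                          # residual starts at zero
for _ in range(200):
    X = rng.standard_normal((32, 2))          # states seen in this window
    Xdot = X @ (A + B).T                      # measured state derivatives
    W = online_update(W, X, Xdot)
```

After repeated windows the residual converges to the unmodeled term, so the hybrid model `A + W` tracks the true dynamics without retraining from scratch.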
In this work, we consider the problem of deriving and incorporating accurate dynamics models for model predictive control (MPC), with an application to quadrotor control. MPC relies on precise dynamics models to achieve the desired closed-loop performance. However, the presence of uncertainties in complex systems and the environments they operate in poses a challenge in obtaining sufficiently accurate representations of the system dynamics. In this work, we make use of a deep learning tool, knowledge-based neural ordinary differential equations (KNODE), to augment a model obtained from first principles. The resulting hybrid model encompasses both a nominal first-principles model and a neural network trained on simulated or real-world experimental data. Using a quadrotor, we benchmark our hybrid model against a state-of-the-art Gaussian process (GP) model, and show that the hybrid model provides more accurate predictions of the quadrotor dynamics and is able to generalize beyond the training data. To improve closed-loop performance, the hybrid model is integrated into a novel MPC framework, known as KNODE-MPC. In terms of trajectory tracking performance, the results show that the integrated framework achieves an improvement of 60.2% in simulations and more than 21% in physical experiments.
Brain extraction and registration are important preprocessing steps in neuroimaging data analysis, where the goal is to extract the brain regions from MRI scans (i.e., the extraction step) and align them with a target brain image (i.e., the registration step). Conventional research mainly focuses on developing methods for the extraction and registration tasks separately under supervised settings. The performance of these methods highly depends on the amount of training samples and on visual inspections performed by experts for error correction. However, in many medical studies, collecting voxel-level labels and conducting manual quality control on high-dimensional neuroimages (e.g., 3D MRI) are very expensive and time-consuming. Moreover, brain extraction and registration are highly related tasks in neuroimaging data and should be solved collectively. In this paper, we study the problem of unsupervised collective extraction and registration in neuroimaging data. We propose a unified end-to-end framework, called ERNet (Extraction-Registration Network), to jointly optimize the extraction and registration tasks, allowing feedback between them. Specifically, we use a pair of multi-stage extraction and registration modules to learn the extraction mask and transformation, where the extraction network improves the extraction accuracy incrementally and the registration network successively warps the extracted image until it is well-aligned with the target image. Experimental results on real-world datasets show that our proposed method can effectively improve the performance on extraction and registration tasks in neuroimaging data. Our code and data can be found at https://github.com/ERNetERNet/ERNet
Deformable image registration, i.e., the task of aligning multiple images into one coordinate system by non-linear transformation, serves as an essential preprocessing step for neuroimaging data. Recent research on deformable image registration mainly focuses on improving registration accuracy using multi-stage alignment methods, where the source image is repeatedly deformed in stages by the same neural network until it is well-aligned with the target image. Conventional methods for multi-stage registration can often blur the source image, as the pixel/voxel values are repeatedly interpolated from the image generated by the previous stage. However, maintaining image quality, such as sharpness, during image registration is crucial to medical data analysis. In this paper, we study the problem of anti-blur deformable image registration and propose a novel solution, called Anti-Blur Network (ABN), for multi-stage image registration. Specifically, we use a pair of short-term registration and long-term memory networks to learn the nonlinear deformations at each stage, where the short-term registration network learns how to improve the registration accuracy incrementally and the long-term memory network combines all the previous deformations, allowing the interpolation to be performed on the raw image directly and preserving image sharpness. Extensive experiments on both natural and medical image datasets demonstrate that ABN can accurately register images while preserving their sharpness. Our code and data can be found at https://github.com/anonymous3214/ABN
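The key mechanism here, composing all previous deformations so the raw image is interpolated only once, can be shown in one dimension. The sketch below is an illustrative assumption about the composition rule (displacement fields composed as d(x) = d1(x) + d2(x + d1(x))), not ABN's actual network.

```python
import numpy as np

def warp(signal, disp):
    # Warp a 1-D "image" by sampling it at x + disp(x) with linear
    # interpolation (np.interp clamps at the borders).
    xs = np.arange(len(signal), dtype=float)
    return np.interp(xs + disp, xs, signal)

def compose(d1, d2):
    # Combine two displacement fields into one: following d1 and then d2
    # equals d(x) = d1(x) + d2(x + d1(x)).  Warping the *raw* image with
    # the composed field needs a single interpolation, which is how an
    # ABN-style long-term memory can avoid accumulating blur.
    xs = np.arange(len(d1), dtype=float)
    return d1 + np.interp(xs + d1, xs, d2)

image = np.linspace(0.0, 1.0, 10)          # linear ramp: interpolation exact
d1 = np.full(10, 0.3)                      # stage-1 displacement field
d2 = np.full(10, 0.2)                      # stage-2 displacement field

two_stage = warp(warp(image, d1), d2)      # interpolates twice (blurs)
one_stage = warp(image, compose(d1, d2))   # interpolates the raw image once
```

On this exactly-interpolable ramp the two agree away from the clamped border; on real images, the repeated interpolation in `two_stage` is what smears fine detail, while `one_stage` touches the raw voxels only once.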
Vision-language pre-training (VLP) on large-scale datasets has shown premier performance on various downstream tasks. A complete and fair benchmark (i.e., including large-scale pre-training datasets and diverse downstream tasks) is essential for VLP. While there are plenty of benchmarks with English corpora, building a rich benchmark for VLP in other languages, such as Chinese, remains a critical problem. To this end, we build a Chinese cross-modal benchmark called Zero for the research community to compare VLP models. We release two pre-training datasets and five fine-tuning datasets for downstream tasks. Alongside, we propose a novel pre-training framework for cross-modal learning. Specifically, we apply global contrastive pre-ranking to learn the individual representations of images and texts, and then fuse the representations in a fine-grained ranking manner via an image-text cross encoder and a text-image cross encoder. To further enhance the capability of the model, we propose a two-way distillation strategy consisting of target-guided distillation and feature-guided distillation. For brevity, we name our model R2D2. We achieve state-of-the-art performance on four public cross-modal datasets and the proposed five downstream datasets. On the zero-shot tasks of Flickr30k-CN, COCO-CN, and MUGE, R2D2 pre-trained on a 250-million-pair dataset achieves mean-recall improvements of 4.7%, 5.4%, and 6.3%, respectively, compared to the state of the art. The datasets, models, and code are available at https://github.com/yuxie11/r2d2
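The "global contrastive pre-ranking" step can be illustrated with an InfoNCE/CLIP-style loss, which is a common instantiation of such objectives; treating it as R2D2's exact loss, and the temperature value, are assumptions for the sake of the sketch.

```python
import numpy as np

def contrastive_loss(img_emb, txt_emb, tau=0.07):
    # InfoNCE-style global contrastive loss over a batch: the i-th image
    # and i-th text are a positive pair; every other pairing is a negative.
    img = img_emb / np.linalg.norm(img_emb, axis=1, keepdims=True)
    txt = txt_emb / np.linalg.norm(txt_emb, axis=1, keepdims=True)
    logits = img @ txt.T / tau                      # cosine similarities
    logits -= logits.max(axis=1, keepdims=True)     # numerical stability
    probs = np.exp(logits)
    probs /= probs.sum(axis=1, keepdims=True)
    idx = np.arange(len(img_emb))
    return float(-np.log(probs[idx, idx]).mean())   # cross-entropy on diagonal

rng = np.random.default_rng(2)
texts = rng.standard_normal((16, 32))
# Well-aligned pairs (image embedding close to its text) give a small loss;
# misaligned (shuffled) pairs give a large one.
aligned = contrastive_loss(texts + 0.01 * rng.standard_normal((16, 32)), texts)
shuffled = contrastive_loss(texts[::-1].copy(), texts)
```

The cross encoders in the abstract then re-rank candidate pairs with full image-text attention, which this representation-level loss cannot capture.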
In real-world scenarios, the state observations that an agent receives may contain measurement errors or adversarial noise, misleading the agent into taking suboptimal actions or even collapsing during training. In this paper, we study the training robustness of distributional reinforcement learning (RL), a class of state-of-the-art methods that estimate the entire distribution, rather than only the expectation, of the total return. Firstly, we validate the contraction of both expectation-based and distributional Bellman operators in the state-noisy Markov decision process (SN-MDP), a typical tabular case that incorporates both random and adversarial state observation noise. Beyond the SN-MDP, we analyze the vulnerability of the least-squares loss in expectation-based RL with either linear or nonlinear function approximation. By contrast, we theoretically characterize the bounded gradient norm of the distributional RL loss based on histogram density estimation. The resulting stable gradients during the optimization of distributional RL account for its better training robustness against state observation noise. Finally, extensive experiments on a suite of games verify the convergence of both expectation-based and distributional RL in SN-MDP-like settings under different strengths of state observation noise. More importantly, in noisy settings beyond the SN-MDP, distributional RL is less vulnerable to noisy state observations than its expectation-based counterpart.
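The gradient-norm contrast the abstract argues for is easy to make concrete: a least-squares TD gradient scales with the (noise-corrupted) error, while a categorical cross-entropy gradient over histogram atoms is bounded per component. This is a minimal numerical illustration, with a C51-style 51-atom histogram as an assumption; it is not the paper's derivation.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def expectation_grad(q, td_target):
    # Expectation-based RL: the least-squares TD loss gradient w.r.t. the
    # value estimate grows linearly with the (possibly noisy) TD error.
    return 2.0 * (q - td_target)

def distributional_grad(logits, target_probs):
    # Distributional RL with a histogram (categorical) parameterization:
    # the cross-entropy gradient w.r.t. the logits is softmax - target,
    # so each component lies in [-1, 1] no matter how large returns get.
    return softmax(logits) - target_probs

noisy_error = 1000.0                       # a large, noise-induced TD target
g_exp = expectation_grad(0.0, noisy_error)

logits = np.zeros(51)                      # 51 atoms, as in C51 (assumption)
target = np.zeros(51)
target[50] = 1.0                           # all target mass on one atom
g_dist = distributional_grad(logits, target)
```

The unbounded `g_exp` is what destabilizes expectation-based training under observation noise, while `g_dist` stays bounded regardless of the corrupted target.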
Knowledge graph embedding (KGE), which maps entities and relations in a knowledge graph into continuous vector spaces, has achieved great success in predicting missing links in knowledge graphs. However, knowledge graphs often contain incomplete triples that are difficult for KGEs to infer inductively. To address this challenge, we resort to analogical inference and propose a novel and general self-supervised framework, AnKGE, to enhance KGE models with analogical inference capability. We propose an analogical object retriever that retrieves appropriate analogical objects at the entity, relation, and triple levels. In AnKGE, we train an analogy function for each level of analogical inference, which takes as input the original element embedding from a well-trained KGE model and outputs the analogical object embedding. To combine the inductive inference capability of the original KGE model with the analogical inference capability added by AnKGE, we interpolate the analogy score with the base model score and introduce adaptive weights in the score function for prediction. Through extensive experiments on the FB15k-237 and WN18RR datasets, we show that AnKGE achieves competitive results on the link prediction task and performs analogical inference well.
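The score interpolation described here can be sketched as a weighted mix of the per-level analogy scores folded into the base model's score. The softmax weighting and the fixed interpolation coefficient `lam` below are hypothetical simplifications; the paper's adaptive weights may take a different form.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def ankge_score(base_score, analogy_scores, weight_logits, lam=0.3):
    # Hypothetical AnKGE-style scoring: entity-, relation-, and triple-level
    # analogy scores are mixed by adaptive (softmax-normalized) weights,
    # then interpolated with the well-trained base KGE model's score.
    analogy = float(softmax(weight_logits) @ analogy_scores)
    return (1.0 - lam) * base_score + lam * analogy

s = ankge_score(
    base_score=0.8,                               # base KGE model's score
    analogy_scores=np.array([0.6, 0.9, 0.7]),     # per-level analogy scores
    weight_logits=np.array([0.0, 0.0, 0.0]),      # uniform adaptive weights
)
```

With `lam = 0` the framework reduces exactly to the base KGE model, which is why the interpolation preserves the original inductive capability while adding the analogical signal.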
Temporal sentence grounding (TSG) aims to identify the temporal boundary of a specific segment in an untrimmed video given a sentence query. All existing works first utilize a sparse sampling strategy to extract a fixed number of video frames and then conduct multi-modal interactions with the query sentence for reasoning. However, we argue that these methods have overlooked two indispensable issues: 1) Boundary bias: the annotated target segment generally refers to two specific frames as its start and end timestamps. The video downsampling process may lose these two frames and take the adjacent irrelevant frames as new boundaries. 2) Reasoning bias: such incorrect new boundary frames also lead to bias during frame-query interaction, reducing the generalization ability of the model. To alleviate the above limitations, in this paper we propose a novel Siamese Sampling and Reasoning Network (SSRN) for TSG, which introduces a siamese sampling mechanism to generate additional contextual frames to enrich and refine the new boundaries. Specifically, a reasoning strategy is developed to learn the inter-relationship among these frames and generate soft labels on boundaries for more accurate frame-query reasoning. This mechanism is also able to supplement the absent consecutive visual semantics to the sampled sparse frames for fine-grained activity understanding. Extensive experiments demonstrate the effectiveness of SSRN on three challenging datasets.
Normalizing flow is a class of deep generative models for efficient sampling and density estimation. In practice, the flow often appears as a chain of invertible neural network blocks; to facilitate training, existing works have regularized flow trajectories and designed special network architectures. The current paper develops a neural ODE flow network inspired by the Jordan-Kinderlehrer-Otto (JKO) scheme, which allows efficient block-wise training of the residual blocks and avoids inner loops of score matching or variational learning. As the JKO scheme unfolds the dynamics of the gradient flow, the proposed model naturally stacks residual network blocks one by one, reducing the memory load and the difficulty of performing end-to-end training of deep flow networks. We also develop an adaptive time reparameterization of the flow network with a progressive refinement of the trajectory in probability space, which improves the model's training efficiency and accuracy in practice. Using numerical experiments with synthetic and real data, we show that the proposed JKO-iFlow model achieves similar or better performance in generating new samples compared with existing flow and diffusion models, at a significantly reduced computational and memory cost.
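The block-wise training idea can be illustrated on a one-dimensional Gaussian toy problem: each residual block is fit greedily on the frozen output of its predecessors by minimizing a proximal (JKO-style) objective, with no end-to-end backpropagation. The scalar blocks, the grid search, and the omission of the entropy term from the free energy are all simplifying assumptions of this sketch.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(3.0, 1.0, size=2000)     # samples far from the N(0, 1) target

def jko_objective(a, x, h=1.0):
    # Proximal (JKO) objective for one scalar residual block x -> x - a*x:
    # squared transport cost / (2h) plus the potential energy E[y^2]/2 of a
    # standard-normal target.  The entropy term of the true free energy is
    # dropped in this sample-based sketch (an assumption for simplicity).
    dx = -a * x
    y = x + dx
    return (dx ** 2).mean() / (2.0 * h) + (y ** 2).mean() / 2.0

blocks = []
for _ in range(4):
    # Greedy block-wise training: choose this block's parameter on the
    # frozen output of the previous blocks -- no backprop through the whole
    # chain, which is the memory saving the JKO unrolling provides.
    grid = np.linspace(0.0, 1.0, 101)
    a = grid[int(np.argmin([jko_objective(g, x) for g in grid]))]
    blocks.append(a)
    x = x - a * x                        # push samples through the new block
```

Each block takes one proximal step of the gradient flow, so the sample distribution contracts toward the target stage by stage; stacking more blocks refines the trajectory further.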